In the United States Patent and Trademark Office

Applicants: Mark D. Matson; Bruce E. Edwards
Assignee: Broadcom Corporation
Title: MAC Controlled Sleep Mode/Wake-up Mode with Staged Wake-up for Power Management Devices
Serial No.: 10/810,094
Filed: March 26, 2004
Examiner: Andrew Wendell
Group Art Unit: 2618
Docket No.: BP 3197
Customer No.: 34399



Board of Patent Appeals and Interferences 
United States Patent and Trademark Office
P.O. Box 1450 
Alexandria, VA 22313-1450 

APPEAL BRIEF UNDER 37 CFR § 41.37

Dear Sir: 

Applicants submit this Appeal Brief pursuant to the Notice of Panel Decision mailed in 
this case on December 7, 2007. The fee for this Appeal Brief is being paid via the USPTO EFS.
The Board is also authorized to deduct any other amounts required for this appeal brief and to
credit any amounts overpaid to Deposit Account No. 502264.

I. REAL PARTY IN INTEREST - 37 CFR § 41.37(c)(1)(i)

The real party in interest is the assignee, Broadcom Corporation, as named in the caption
above and as evidenced by the assignment set forth at Reel 015159, Frame 0938.

II. RELATED APPEALS AND INTERFERENCES - 37 CFR § 41.37(c)(1)(ii)

Based on information and belief, there are no appeals or interferences that could directly 
affect or be directly affected by or have a bearing on the decision by the Board of Patent Appeals 
and Interferences in the pending appeal. Pursuant to current Patent Office practice, Appendix
"A" is provided for copies of all decisions rendered by a court or the Board in the proceedings
identified in this "Related Appeals and Interferences" section, and is intentionally left empty.

III. STATUS OF CLAIMS - 37 CFR § 41.37(c)(1)(iii)

Claims 1-20 are pending in the application. Claims 1-20 stand rejected. The rejection of
claims 1-20 is appealed. Appendix "B" contains the full set of pending claims.



Austin, Texas
January 7, 2008 



S/N: 10/810,094 



IV. STATUS OF AMENDMENTS - 37 CFR § 41.37(c)(1)(iv)

On September 6, 2007, Applicants filed a Response to Final Office Action requesting that
claims 11, 19 and 20 be amended. In an Advisory Action dated September 21, 2007, the
Examiner declined to enter the requested amendments.

V. SUMMARY OF CLAIMED SUBJECT MATTER - 37 CFR § 41.37(c)(1)(v)

The subject matter defined in independent claim 1 may be understood with reference to
the example embodiment depicted in Figures 2-4, which depict the claimed data processor for use
in a wireless communication device. To comply with 37 CFR § 41.37(c)(1)(v), a color-coded
comparison of independent claim 1 (including reference characters) and the relevant portions of
the figures is set forth below.

As shown in Figures 2 and 3, a data processor for use in a wireless communication device
includes a processing unit (e.g., processing module 51 and/or a wireless interface device having a
MAC module implemented with communication processor 100). The depicted data processor also
includes an instruction pipeline circuit 140, along with one or more processing modules (e.g.,
modules in the transmit/receive queues and supporting hardware 182, digital receiver processing
module 64, digital transmitter processing module 76, or any of the interface modules 66, 68, 70,
72, 71, 73, 76, 78, 80, 82, 84, or 85). Finally, the data processor also includes a timer for
generating a time-out interval and power control logic 172.




Figure 3 






In operation and as depicted with reference to Figure 4, the power control logic 172
detects a sleep instruction 405 and places the processing unit 102, instruction pipeline circuit 140
and at least one processing module (e.g., 64, 76, etc.) in a low-power state 406, where the power
control logic 172 is operative in response to a wake-up signal 407 to reactivate the instruction
pipeline circuit 140, and consequently at least one processing module only to the extent required
by the wake-up signal.
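By way of illustration only, the operation summarized above can be modeled in a short Python sketch. The sketch is not drawn from the application or the record: the class and method names, the module labels, and the event-to-module lookup table (`needs`) are hypothetical, and the processing unit is not modeled separately. Under those assumptions, it shows a sleep instruction storing the wake-up conditions and time-out interval and powering down the pipeline and modules; a wake-up signal formed as a logical OR of the enabled conditions and the time-out (compare claim 3); and, on wake-up, reactivation of the pipeline first, followed only by the modules the triggering event requires.

```python
from dataclasses import dataclass, field


@dataclass
class PowerControlLogic:
    """Hypothetical model of the summarized power control logic."""

    all_modules: frozenset
    wake_conditions: frozenset = frozenset()
    timeout: int = 0
    pipeline_on: bool = True
    modules_on: set = field(default_factory=set)

    def __post_init__(self):
        # Start fully active: every processing module is powered.
        self.modules_on = set(self.all_modules)

    def sleep(self, wake_conditions, timeout):
        # Sleep instruction detected: record the wake-up conditions and the
        # time-out interval, then power down the pipeline and all modules.
        self.wake_conditions = frozenset(wake_conditions)
        self.timeout = timeout
        self.pipeline_on = False
        self.modules_on.clear()

    def wake_signal(self, events, elapsed):
        # Logical OR of any enabled wake-up condition and the time-out.
        return bool(self.wake_conditions & set(events)) or elapsed >= self.timeout

    def poll(self, events, elapsed, needs):
        # Returns True if this call woke the processor. The pipeline is
        # reactivated first; then only the modules that `needs` (a
        # hypothetical event -> modules table) lists for the triggering
        # events are powered back up.
        if self.pipeline_on or not self.wake_signal(events, elapsed):
            return False
        self.pipeline_on = True
        for event in self.wake_conditions & set(events):
            self.modules_on |= set(needs.get(event, ()))
        return True
```

For example, after `pcl.sleep(wake_conditions={"rx_frame"}, timeout=1000)`, a call to `pcl.poll(events=["rx_frame"], elapsed=20, needs={"rx_frame": {"rx"}})` returns `True` and leaves only `"rx"` powered, while a time-out with no asserted condition reactivates the pipeline but no modules.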

In further compliance with 37 CFR § 41.37(c)(1)(v), a color-coded comparison of
selected Figures from the application and each of the pending independent claims is attached at
Appendix "C" to provide a concise explanation of the subject matter defined in each independent
claim. The subject matter of the independent claims is set forth in the specification at pages 3-15
(paragraphs 8-42).

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL - 37 CFR § 41.37(c)(1)(vi)

In the Final Office Action dated June 6, 2007, the Examiner rejected claims 11-18 under
35 U.S.C. § 101, but indicated that the rejection could be overcome by including a "computer
readable medium encoded with a computer program" limitation in the claims. In addition,
claims 1, 2, 4-6, 8, 10-11, 13-14 and 16-18 were rejected as anticipated by U.S. Patent No.
6,473,607 to Shohara et al.; claim 3 was rejected as obvious over Shohara in view of U.S. Patent
Publication No. 2003/0028677 to Fukuhara; claims 7, 12, and 19-20 were rejected as obvious
over Shohara in view of U.S. Patent No. 6,622,251 to Lindskog et al.; and claims 9 and 15 were
rejected as obvious over Shohara in view of U.S. Patent Publication No. 2002/0059434 to
Karaoguz et al. On September 6, 2007, Applicants filed a Response to Final Office Action
requesting that claim 11 be amended as suggested by the Examiner in the Final Office Action,







and that claims 19-20 be amended to clarify the "instruction pipeline" requirement, thereby
presenting the claims in better form for consideration on appeal. In an Advisory Action dated
September 21, 2007, the Examiner declined to enter the requested amendments.

As explained below, Applicants traverse these rejections because (1) the Applicants'
amendment to claims 11-18 to overcome the rejection under 35 U.S.C. § 101 as suggested by the
Examiner was improperly denied, and (2) none of the cited art references meets the "instruction
pipeline" limitation variously recited in the claims. In particular, Applicants appeal the
following grounds of rejection from the Final Office Action dated June 6, 2007:

(1) The rejection of claims 11-18 under 35 U.S.C. § 101 as directed to non-statutory
subject matter;

(2) The rejection of claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18 under 35 U.S.C. § 102(b)
as being anticipated by U.S. Patent No. 6,473,607 to Shohara et al. ("Shohara");

(3) The rejection of claim 3 under 35 U.S.C. § 103(a) as being unpatentable over
Shohara in view of U.S. Patent Publication No. 2003/0028677 to Fukuhara
("Fukuhara");

(4) The rejection of claims 7, 12 and 19-20 under 35 U.S.C. § 103(a) as being
unpatentable over Shohara in view of U.S. Patent No. 6,622,251 to Lindskog et al.
("Lindskog"); and

(5) The rejection of claims 9 and 15 under 35 U.S.C. § 103(a) as being
unpatentable over Shohara in view of U.S. Patent Publication No. 2002/0059434
to Karaoguz et al. ("Karaoguz").

For purposes of organizing the issues in this appeal, the appeal issues are discussed in
three groups: (A) the statutory subject matter rejection of claims 11-18, listed above as appeal
issue 1; (B) the anticipation rejections of claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18, listed
above as issue 2; and (C) the obviousness rejections of claims 3, 7, 9, 12, 15 and 19-20, listed
above as issues 3-5.

VII. ARGUMENT - 37 CFR § 41.37(c)(1)(vii)

A. Amended Claims 11-18 Recite Statutory Subject Matter

Applicants appeal the statutory subject matter rejection of claims 11-18 because
Applicants have amended the claims to recite statutory subject matter as suggested by the
Examiner. In particular, while claims 11-18 were rejected under 35 U.S.C. § 101, the Examiner
explicitly stated: "Note, 'computer readable medium encoded with a computer program'
would make the claim statutory." See, Final Office Action, p. 3. In response, Applicants
amended the claims as suggested by the Examiner, as shown below in the proposed amendment:









11. (Currently Amended) An article of manufacture having at least one computer
readable medium encoded with a computer program comprising recordable medium
having stored thereon executable instructions and data which, when executed by at
least one processing device, cause the at least one processing device to:

detect a sleep instruction for the processing device;
specify one or more wake-up conditions and a time-out interval;
power down an instruction pipeline and one or more processor modules;
reactivate the instruction pipeline upon detection of a wake-up signal corresponding to
either a wake-up condition or the time-out interval, and
process one or more instructions in the instruction pipeline to reactivate any of the one
or more processor modules required to respond to a detected wake-up condition.

Response to Final Office Action (September 6, 2007). Because this amendment was submitted
to comply with a requirement of form set forth in the previous Office Action, Applicants submit
that the amendment was permitted under 37 CFR § 1.116(b)(1), and therefore respectfully
request that the statutory subject matter rejection of claims 11-18 under 35 U.S.C. § 101 be
withdrawn and that the amended claims be allowed.

B. Claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18 Are Not Anticipated by Shohara

In response to the first Office Action rejection of claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18
under 35 U.S.C. § 102 as being anticipated by Shohara, Applicants explained that Shohara
fails to disclose an "instruction pipeline circuit," much less Applicants' scheme for using
detected sleep instructions and wake-up signals to selectively power down and reactivate
processing modules in the instruction pipeline circuit only to the extent required by the sleep
instruction and the wake-up signal. See, Response After Non-Final Office Action, pp. 6-8
(March 19, 2007). To underscore this deficiency, Applicants pointed out that the word
"pipeline" never appears in Shohara, a fact that was not disputed by the Examiner. In the Final
Office Action, the Examiner responds to this deficiency by proposing an unreasonably broad and
wholly unsupported definition of "instruction pipeline" and then asserting that Shohara meets
this overbroad definition. In particular, the Examiner asserts that "any circuit that carries out
instructions (i.e. control information) to other components through channels (i.e. connections)
can be considered an instruction pipeline given the broadest responsible (sic, reasonable)
interpretation." Final Office Action, p. 9 (June 6, 2007). With all due respect, this is simply not
a reasonable interpretation of the "instruction pipeline" term. As explained more fully below,









when the proper interpretation of the "instruction pipeline" term is used, Shohara simply does 
not meet the requirements of the claims. 

1. Correct Interpretation of "Instruction Pipeline" 

According to the MPEP, the pending claims must be "given their broadest reasonable
interpretation consistent with the specification" during patent examination. See, MPEP § 2111.
The Federal Circuit has confirmed this standard:

The Patent and Trademark Office ("PTO") determines the scope of claims in patent
applications not solely on the basis of the claim language, but upon giving claims their
broadest reasonable construction "in light of the specification as it would be interpreted
by one of ordinary skill in the art." In re Am. Acad. of Sci. Tech. Ctr., 367 F.3d 1359,
1364[, 70 USPQ2d 1827] (Fed. Cir. 2004). Indeed, the rules of the PTO require that
application claims must "conform to the invention as set forth in the remainder of the
specification and the terms and phrases used in the claims must find clear support or
antecedent basis in the description so that the meaning of the terms in the claims may be
ascertainable by reference to the description." 37 CFR 1.75(d)(1).

Phillips v. AWH Corp., 415 F.3d 1303, 1316, 75 USPQ2d 1321, 1329 (Fed. Cir. 2005). Thus,
"the broadest reasonable interpretation of the claims must also be consistent with the
interpretation that those skilled in the art would reach." In re Cortright, 165 F.3d 1353, 1359, 49
USPQ2d 1464, 1468 (Fed. Cir. 1999). "This means that the words of the claim must be given
their plain meaning unless the plain meaning is inconsistent with the specification." In re Zletz,
893 F.2d 319, 321, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989). "The ordinary and customary
meaning of a term may be evidenced by a variety of sources, including 'the words of the claims
themselves, the remainder of the specification, the prosecution history, and extrinsic evidence
concerning relevant scientific principles, the meaning of technical terms, and the state of the
art.'" MPEP § 2111.01, citing Phillips v. AWH Corp., 415 F.3d at 1314, 75 USPQ2d at 1327.

Based on the foregoing, Applicants submit that the term "instruction pipeline" refers to a
processing structure that separates the execution of instructions into multiple stages (e.g.,
instruction fetch, instruction decode and operand read, execution, and write), and executes
separate instructions in each stage simultaneously, thereby allowing multiple instructions to be
executed concurrently. As depicted in the application, each stage simultaneously processes its
assigned set of signals, and then forwards the results of the processing to the next stage and
receives from the prior stage the results of the prior stage's processing. The resulting overlap of
operations by each stage increases the overall throughput of the processor structure. Applicants'
proposed interpretation is consistent with the specification as it would be interpreted by one of









ordinary skill in the art. See, e.g., Application, Figure 3 ("instruction pipeline circuit 140") and
paragraphs 30, 32, 33, 35-38 and 41. See, MPEP § 2111, citing In re Am. Acad. of Sci. Tech.
Ctr., 367 F.3d 1359, 1364[, 70 USPQ2d 1827] (Fed. Cir. 2004). In addition, Applicants'
proposed interpretation is consistent with the "ordinary and customary meaning" as evidenced by
substantial extrinsic evidence previously submitted, including Wikipedia's definition of
"Instruction Pipeline"; J. Silc et al., Processor Architecture, pp. 18-20, 349 and 354 (1999)
("instruction pipeline"); D. Patterson et al., Computer Architecture: A Quantitative Approach,
pp. 251-252 (1990) ("pipelining"); W. Rosch, The Winn L. Rosch Hardware Bible, Third
Edition, pp. 44-45 (1994) ("pipelining"); Microsoft Computer Dictionary, p. 367 (3rd ed. 1997)
("pipelining"); and H. Messmer, The Indispensable PC Hardware Book, pp. 216-218, 1239 and
1250-1251 (4th ed. 2002) ("instruction pipelining"). If there is any reputable extrinsic evidence
that supports the Examiner's proposed interpretation here, Applicants would request that it be
provided. However, based on Applicants' review, Applicants' proposed interpretation is
consistent with the specification and with the submitted extrinsic evidence definitions, which
better reflect how one of ordinary skill in the art would interpret the "instruction pipeline" term.
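The stage overlap described in the proposed interpretation can be illustrated with a minimal Python sketch. It assumes an ideal four-stage pipeline using the example stages named above (fetch, decode/operand read, execute, write) with no hazards or stalls, and its function names are hypothetical. Instruction i occupies stage s during cycle i + s, so n instructions finish in n + k - 1 cycles on a k-stage pipeline, instead of n × k cycles when each instruction runs to completion before the next begins.

```python
# Example stages taken from the proposed interpretation; an ideal
# (hazard-free, stall-free) pipeline is assumed for illustration.
STAGES = ("fetch", "decode/operand read", "execute", "write")


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each clock cycle to the (instruction, stage) pairs active in it.

    Instruction i enters stage s at cycle i + s, so in steady state every
    stage is working on a different instruction during the same cycle.
    """
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


def total_cycles(n_instructions, n_stages, pipelined=True):
    """Ideal cycle count with overlapped vs. one-at-a-time execution."""
    if n_instructions <= 0:
        return 0
    if pipelined:
        return n_instructions + n_stages - 1
    return n_instructions * n_stages


if __name__ == "__main__":
    for cycle, work in sorted(pipeline_schedule(3).items()):
        print(cycle, work)
    print(total_cycles(100, len(STAGES)), total_cycles(100, len(STAGES), pipelined=False))
```

For three instructions, `pipeline_schedule(3)[2]` lists fetch, decode/operand read, and execute all occurring in cycle 2 for different instructions, which is the simultaneous, multi-instruction operation the extrinsic definitions describe. As the Wikipedia excerpt in Exhibit A notes, real pipelines may stall on dependencies, so these counts represent the best case.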

2. Shohara Does Not Meet The Properly Interpreted "Instruction
Pipeline" Requirement

As explained above, the broadest reasonable interpretation of the "instruction pipeline"
term that is consistent with the specification (and confirmed by the extrinsic evidence) refers to a
processing structure that separates the execution of instructions into multiple stages (e.g.,
instruction fetch, instruction decode and operand read, execution, and write), and executes
separate instructions in each stage simultaneously, thereby allowing multiple instructions to be
executed concurrently. This requirement is simply not met by Shohara's disclosure of using an
"event scheduler" to control reception of "scheduled intermittent messages" with a dual mode
timer that uses different clock signals to power down all idle components during a scheduled
power save sleep mode. Shohara therefore does not anticipate the present invention's claimed
scheme of detecting a "sleep instruction," which is used to place the processing unit, instruction
pipeline circuit and at least one processing module in a low-power state, and then reactivating
the instruction pipeline circuit to the extent required by a received "wake-up signal." See, e.g.,
claims 1 and 11. Accordingly, Applicants respectfully request that the anticipation rejection of
claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18 be withdrawn and that the claims be allowed.









C. Claims 3, 7, 9, 12, 15 and 19-20 Are Not Obvious

In response to the Examiner's obviousness rejections of claims 3, 7, 9, 12, 15 and 19-20,
Applicants submit that a prima facie case of obviousness has not been established showing that
all the claim limitations are taught or suggested by the prior art. In re Royka, 490 F.2d 981, 180
USPQ 580 (CCPA 1974). First, claims 3, 7, 9, 12, 15 and 20 each include an "instruction
pipeline" requirement (either expressly or by virtue of their dependency from claims 1 and 11).¹
As explained above, the broadest reasonable interpretation of the "instruction pipeline" term that
is consistent with the specification (and confirmed by the extrinsic evidence) refers to a
processing structure that separates the execution of instructions into multiple stages (e.g.,
instruction fetch, instruction decode and operand read, execution, and write), and executes
separate instructions in each stage simultaneously, thereby allowing multiple instructions to be
executed concurrently. This requirement is simply not met by Shohara's disclosure, either alone
or in combination with the other cited references, of using an "event scheduler" to control
reception of "scheduled intermittent messages" with a dual mode timer that uses different clock
signals to power down all idle components during a scheduled power save sleep mode.

In addition, Applicants submit that the cited references do not disclose or suggest
Applicants' claimed invention for controlling power in a communications processor by
responding to "sleep instructions" and "wake-up signals" to selectively reactivate only the
processor modules in the instruction pipeline circuit that are required to respond to the detected
wake-up signal. Shohara's failure to disclose an "instruction pipeline circuit" is not remedied by
the Fukuhara, Lindskog or Karaoguz references, none of which refers to a "pipeline circuit." Nor
do the cited references, alone or in combination, disclose or suggest a communication processor
which goes to sleep at arbitrary times (via the sleep instruction) and which can be awakened by
external events (via wake-up signals) which can occur at any time. Nor do the cited references
disclose that the individual modules in the instruction pipeline circuit can be reactivated only to
the extent required by the wake-up signal. Indeed, the Shohara disclosure seems to directly
contradict the idea of selective power-up. See, Shohara Patent, col. 12, lines 32-56. On this
point, Shohara is quite clear that the Shohara controller "powers down all idle components of the

¹ While the Examiner denied Applicants' after-final amendments to claims 19-20 (adding the "instruction
pipeline circuit" limitation), Applicants submit that the "instruction pipeline circuit" limitation was
included in the original claim 20. If it is determined that this limitation is not disclosed in the prior art,
Applicants reserve the right to amend claim 19 to incorporate the limitations of claim 20.









device between message receptions in a power saving sleep mode to conserve battery power.
During active mode when the device is fully active in reception of messages the timer uses a
reference oscillator with a relatively high frequency to support digital processing by the
receiver." Shohara Patent, col. 10, lines 39-44. When, as here, the Shohara reference teaches
away from the claimed invention, a prima facie case of obviousness has been rebutted. See,
MPEP § 2144.05(III) ("A prima facie case of obviousness may also be rebutted by showing that
the art, in any material respect, teaches away from the claimed invention. In re Geisler, 116 F.3d
1465, 1471, 43 USPQ2d 1362, 1366 (Fed. Cir. 1997). . . ."). Based on the foregoing, Applicants
request that the obviousness rejection of claims 3, 7, 9, 12, 15 and 19-20 be withdrawn and that
the claims be allowed.

VIII. CLAIMS APPENDIX - 37 CFR § 41.37(c)(1)(viii)

A copy of the pending claims involved in the appeal is attached as Appendix "B."

IX. EVIDENCE APPENDIX - 37 CFR § 41.37(c)(1)(ix)

Applicants are not submitting any evidence pursuant to 37 CFR 1.130, 1.131, or 1.132
of this title, and based on information and belief the Examiner is not submitting any evidence
that Applicants will be relying upon (assuming that the office actions and cited references
applied by the Examiner in the Final Office Action for this case do not qualify as "evidence
entered by the examiner"). As for the above-referenced dictionary and other extrinsic evidence
relating to the proper interpretation of the claim term "instruction pipeline" (attached at Exhibit
A in Appendix D), this evidence was submitted as part of Applicants' Response to Final Office
Action. Though Applicants submit that the information in the enclosed references reflects what
was known to those of ordinary skill in the art, these materials are nevertheless being presented
out of an abundance of caution. In support of the Rule 1.116(e) submission requirements,
Applicants state that this information is necessary to demonstrate to the Examiner the proper
interpretation of the "instruction pipeline" term. Applicants also stated that this information was
not earlier presented because the Examiner first asserted the proposed definition of
"instruction pipeline" (to mean "any circuit that carries out instructions (i.e. control information)
to other components through channels (i.e. connections) can be considered an instruction
pipeline given the broadest responsible (sic, reasonable) interpretation") in the Final Office
Action, dated June 6, 2007. While it is not clear from the Advisory Action if the Examiner
considered the Rule 1.116(e) information, Applicants submit that the properly submitted evidence









should be fully considered here. Pursuant to current Patent Office practice, Appendix "D" 
contains copies of all evidence identified in this "Evidence Appendix" section. 

X. RELATED PROCEEDINGS APPENDIX - 37 CFR § 41.37(c)(1)(x)

There are no related proceedings. 

XI. CONCLUSION

For the reasons set forth above, Applicants respectfully submit that the statutory subject
matter rejection of claims 11-18 has been overcome, that the cited Shohara reference does not
anticipate claims 1, 2, 4-6, 8, 10, 11, 13-14 and 16-18, and that the cited art combination does not
make obvious claims 3, 7, 9, 12, 15 and 19-20. Accordingly, Applicants respectfully submit that
the rejection of pending claims 1-20 is unfounded, and request that the rejections of claims 1-20 be
reversed.



FILED ELECTRONICALLY
January 7, 2008



Respectfully submitted, 

/Michael Rocco Cannatti/

Michael Rocco Cannatti 
Attorney for Applicants 
Reg. No. 34,791 









APPENDIX A - RELATED APPEALS AND INTERFERENCES 

There are no decisions rendered by a court or the Board in any related proceeding. 






APPENDIX B 



1. (Original) A data processor for use in a wireless communication device, comprising:
a processing unit;
an instruction pipeline circuit;
at least one processing module;
a timer for generating a time-out interval; and
power control logic for detecting a sleep instruction and placing the processing unit, instruction pipeline circuit and at least one processing module in a low-power state, where the power control logic is operative in response to a wake-up signal to reactivate the instruction pipeline circuit, and consequently at least one processing module only to the extent required by the wake-up signal.

2. (Original) The processor of claim 1, where the instruction pipeline circuit comprises a multi-stage instruction pipeline circuit.

3. (Original) The processor of claim 1, where the wake-up signal comprises a logical OR combination of one or more predetermined wake-up conditions and the time-out interval.

4. (Original) The processor of claim 1, where the power control logic comprises instruction decode logic to detect the sleep instruction.

5. (Original) The processor of claim 1, where the power control logic comprises branch condition logic to respond to the wake-up signal.

6. (Original) The processor of claim 1, where the power control logic, having specified one or more wake-up conditions that the processing unit will respond to when in a low-power state, generates the wake-up signal upon detecting the one or more wake-up conditions or the time-out interval.






7. (Original) The processor of claim 1, where the power control logic instructs the instruction pipeline circuit to complete any instructions preceding the sleep instruction.

8. (Original) The processor of claim 7, where the power control logic instructs the instruction pipeline circuit to cease fetching new instructions after encountering a sleep instruction whose wake-up conditions are currently deasserted.

9. (Original) The processor of claim 1, wherein the processing unit, instruction pipeline circuit and at least one processing module are formed together on a common silicon substrate using CMOS processing.

10. (Original) The processor of claim 6, wherein the wake-up conditions and time-out interval are stored in a register by the power control logic.

11. (Original) An article of manufacture having at least one recordable medium having stored thereon executable instructions and data which, when executed by at least one processing device, cause the at least one processing device to:
detect a sleep instruction for the processing device;
specify one or more wake-up conditions and a time-out interval;
power down an instruction pipeline and one or more processor modules;
reactivate the instruction pipeline upon detection of a wake-up signal corresponding to either a wake-up condition or the time-out interval, and
process one or more instructions in the instruction pipeline to reactivate any of the one or more processor modules required to respond to a detected wake-up condition.

12. (Original) The article of manufacture of claim 11, wherein the processing device executes any instructions received by the instruction pipeline before the sleep instruction is received.






13. (Original) The article of manufacture of claim 11, wherein the instruction pipeline comprises a multistage instruction pipeline, and the processing device reactivates only stages in the multistage instruction pipeline and/or the function units needed to process one or more instructions necessary to analyze and respond to the wake-up signal.

14. (Original) The article of manufacture of claim 11, further comprising a register for holding the specified wake-up conditions and time out signal.

15. (Original) The article of manufacture of claim 11, where the processing device is implemented as part of a single-chip wireless communication device.

16. (Original) The article of manufacture of claim 11, where the executable instructions and data comprise control logic for controlling the operation of the processing device.

17. (Original) The article of manufacture of claim 11, where the processing device powers down the one or more processor modules by freezing a clock signal for said one or more modules.

18. (Original) The article of manufacture of claim 11, where the processing device powers down the one or more processor modules by placing said one or more modules in an idle mode.

19. (Original) A method for managing power in a communications processor by selectively removing one or more processor modules from a standby mode, comprising:
storing one or more wake-up conditions and a time-out interval in a register;
receiving a processor sleep instruction;
executing any pending instructions received by the processor before the sleep instruction;
powering down the one or more processor modules;









receiving a processor wake-up signal corresponding to one of said wake-up conditions or said time-out interval;
powering up only the processor modules required to respond to the detected processor wake-up signal.

20. (Original) The method of claim 19, wherein one of the processor modules comprises an instruction pipeline circuit.











APPENDIX C

[Figure: color-coded comparison of selected Figures and the pending independent claims]



APPENDIX D - EVIDENCE APPENDIX - 37 CFR § 41.37(c)(1)(ix)



Exhibit A - Extrinsic Evidence Regarding 
Proper Interpretation of "Instruction Pipeline" 




Instruction pipeline 



From Wikipedia, the free encyclopedia


An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

[Figure: Basic five-stage pipeline in a RISC machine (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back)]

Pipelining assumes that successive instructions in a program sequence will overlap in execution, as suggested in the next diagram (vertical: instructions, horizontal: time).

Most modern CPUs are driven by a clock. The CPU consists internally of logic and flip flops. When the clock arrives, the flip flops take their new value and the logic then requires a period of time to decode the new values. Then the next clock pulse arrives and the flip flops again take their new values, and so on. By breaking the logic into smaller pieces and inserting flip flops between the pieces of logic, the delay before the logic gives valid outputs is reduced. In this way the clock period can be reduced. For example, the RISC pipeline is broken into five stages with a set of flip flops between each stage.

1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back

Hazards: When a programmer (or compiler) writes assembly code, they make the assumption that each instruction is executed before execution of the subsequent instruction is begun. This assumption is invalidated by pipelining. When this causes a program to behave incorrectly, the situation is known as a hazard. Various techniques for resolving hazards such as forwarding and stalling exist.

A non-pipeline architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle. Pipelining does not completely cancel out idle time in a CPU but making those modules work in parallel improves program execution significantly.

Processors with pipelining are organised inside into stages which can semi-independently work on separate jobs. Each stage is organised and linked into a 'chain' so each stage's output is fed to another stage until the job is done. This organisation of the processor allows overall processing time to be significantly reduced.

Unfortunately, not all instructions are independent. In a simple pipeline, completing an instruction may require 5 stages. To operate at full performance, this pipeline will need to run 4 subsequent independent instructions while the first is completing. If 4 instructions that do not depend on the output of the first instruction are not available, the pipeline control logic must insert a stall or wasted clock cycle into the pipeline until the dependency is resolved. Fortunately, techniques such as forwarding can significantly reduce the cases where stalling is required. While pipelining can in theory increase performance over an unpipelined core by a factor of the number of stages (assuming the clock frequency also scales with the number of stages), in reality, most code does not allow for ideal execution.
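To make the stall-versus-forwarding point concrete, here is a rough model. It assumes the classic five-stage timing (operands read in ID, register file written in the first half of WB and read in the second half of ID, ALU results forwarded from EX, load data available only after MEM). The `Instr` type, `count_stalls` and the four-instruction program are hypothetical illustrations, not part of the article.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    kind: str                 # "load" or "alu" (no other kinds are modelled)
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def count_stalls(program, forwarding):
    """Count stall cycles in a five-stage pipeline (IF ID EX MEM WB)."""
    ready = {}       # register -> earliest cycle a consumer may sit in ID
    prev_id = -1     # cycle in which the previous instruction was in ID
    stalls = 0
    for ins in program:
        t = prev_id + 1
        for reg in ins.srcs:
            t = max(t, ready.get(reg, 0))
        stalls += t - (prev_id + 1)
        if ins.dest is not None:
            if not forwarding:
                ready[ins.dest] = t + 3    # wait for the producer's WB
            elif ins.kind == "load":
                ready[ins.dest] = t + 2    # load-use: data exists after MEM
            else:
                ready[ins.dest] = t + 1    # ALU result forwarded, no stall
        prev_id = t
    return stalls


program = [
    Instr("load", "r1", ("r2",)),        # r1 <- mem[r2]
    Instr("alu", "r3", ("r1", "r4")),    # r3 <- r1 + r4 (needs the load)
    Instr("alu", "r5", ("r3", "r6")),    # r5 <- r3 - r6 (needs r3)
    Instr("alu", "r7", ("r8", "r9")),    # independent
]
print(count_stalls(program, forwarding=False))  # 4
print(count_stalls(program, forwarding=True))   # 1
```

Under these assumptions forwarding removes every stall except the one after the load, which is the pattern the paragraph describes.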



Contents

■ 1 Advantages and Disadvantages

■ 2 Examples

■ 2.1 Generic pipeline

■ 2.1.1 Bubble

■ 2.2 Example 1

■ 2.3 Example 2

■ 3 Complications

■ 4 See also

■ 5 External Links



Advantages and Disadvantages

Pipelining does not help in all cases; there are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.



Advantages of pipelining:

1. The cycle time of the processor is reduced, thus increasing instruction bandwidth in most cases.
Advantages of not pipelining:

1. The processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.

2. The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip flops must be added to the data path of a pipelined processor.

3. A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
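The latency-versus-throughput trade-off in point 2 can be sketched numerically. This assumes five perfectly balanced 2000 ps stages, a 200 ps flip-flop overhead, no stalls, and an unpipelined machine that runs all the logic in one long cycle; all of these are hypothetical figures chosen for illustration.

```python
def latency_and_total_ps(stage_delays_ps, flipflop_overhead_ps, n):
    """Return ((latency, total) unpipelined, (latency, total) pipelined) in ps."""
    k = len(stage_delays_ps)
    # Unpipelined: one long cycle per instruction, one register at the end.
    slow_cycle = sum(stage_delays_ps) + flipflop_overhead_ps
    # Pipelined: every stage gets the slowest stage's delay plus overhead.
    fast_cycle = max(stage_delays_ps) + flipflop_overhead_ps
    unpipelined = (slow_cycle, n * slow_cycle)
    pipelined = (k * fast_cycle, (k + n - 1) * fast_cycle)
    return unpipelined, pipelined


(u_lat, u_total), (p_lat, p_total) = latency_and_total_ps([2000] * 5, 200, 100)
print(u_lat, p_lat)      # 10200 11000: each instruction takes longer...
print(u_total, p_total)  # 1020000 228800: ...but 100 instructions finish sooner
```

The extra flip flops raise single-instruction latency from 10200 ps to 11000 ps, while overlapping 100 instructions cuts the total time by more than a factor of four.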

Examples
Generic pipeline

To the right is a generic pipeline with four stages:

1. Fetch

2. Decode

3. Execute

4. Write-back



[Figure: Generic 4-stage pipeline; the colored boxes represent instructions independent of each other]



The top gray box is the list of instructions waiting to be executed; the bottom gray box is the list of instructions that have been completed; and the middle white box is the pipeline.



Execution

Clock 0: four instructions are waiting to be executed.

Clock 1:
■ the green instruction is fetched from memory

Clock 2:
■ the green instruction is decoded
■ the purple instruction is fetched from memory

Clock 3:
■ the green instruction is executed (actual operation is performed)
■ the purple instruction is decoded
■ the blue instruction is fetched

Clock 4:
■ the green instruction's results are written back to the register file or memory
■ the purple instruction is executed
■ the blue instruction is decoded
■ the red instruction is fetched

Clock 5:
■ the green instruction is completed
■ the purple instruction is written back
■ the blue instruction is executed
■ the red instruction is decoded

Clock 6:
■ the purple instruction is completed
■ the blue instruction is written back
■ the red instruction is executed

Clock 7:
■ the blue instruction is completed
■ the red instruction is written back

Clock 8:
■ the red instruction is completed

Clock 9: all instructions are executed.
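The clock-by-clock walkthrough follows a simple rule that can be sketched in a few lines, assuming an ideal pipeline where instruction i (counting from 0) occupies stage s in clock i + s + 1. The function name `schedule` is hypothetical, and the sketch stops at write-back; the table's "completed" entries fall in the clock after each write-back.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write-back"]


def schedule(instructions, stages=STAGES):
    """Map each clock cycle to the (instruction, stage) pairs active in it.

    Ideal pipeline: instruction i (0-based) occupies stage s during cycle
    i + s + 1, so one new instruction is fetched per cycle and none waits.
    """
    timeline = {}
    for i, name in enumerate(instructions):
        for s, stage in enumerate(stages):
            timeline.setdefault(i + s + 1, []).append((name, stage))
    return dict(sorted(timeline.items()))


timeline = schedule(["green", "purple", "blue", "red"])
for cycle, work in timeline.items():
    print(f"clock {cycle}: " + ", ".join(f"{n} {st}" for n, st in work))
```

The last write-back (red) lands in clock 7, so red is counted as completed in clock 8, matching the table.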



Bubble 




When a "hiccup" in execution occurs, a "bubble" is created in the pipeline in which nothing useful happens. In cycle 2, the fetching of the purple instruction is delayed and the decoding stage in cycle 3 now contains a bubble. Everything "behind" the purple instruction is delayed as well, but everything "ahead" of the purple instruction continues with execution.

Clearly, when compared to the execution above, the bubble yields a total execution time of 9 cycles instead of 8.

Bubbles are unlike stalls, in which nothing useful will happen for the fetch, decode, execute and writeback. It can be completed with a nop code.
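The 9-versus-8 count can be checked with a small model, assuming the same ideal 4-stage pipeline as above, one fetch per cycle, and "completed" meaning the cycle after write-back. `completion_cycles` and its `fetch_delays` parameter are hypothetical names for this sketch.

```python
def completion_cycles(n, k, fetch_delays=None):
    """Clock in which each of n instructions is 'completed', as in the table.

    `fetch_delays` maps an instruction index to the number of cycles its
    fetch is held back; every later instruction is pushed back as well.
    """
    fetch_delays = fetch_delays or {}
    completed = []
    fetch = 0
    for i in range(n):
        fetch += 1 + fetch_delays.get(i, 0)
        # Fetch is stage 1, so write-back happens in cycle fetch + k - 1;
        # the instruction counts as completed one cycle later.
        completed.append(fetch + k)
    return completed


print(completion_cycles(4, 4))          # [5, 6, 7, 8]
print(completion_cycles(4, 4, {1: 1}))  # [5, 7, 8, 9]: purple's fetch slips one cycle
```

Delaying only the purple instruction's fetch leaves green unaffected but pushes purple, blue and red back by one cycle each, so the last completion moves from clock 8 to clock 9.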
Example 1 

A typical instruction to add two numbers might be ADD A, B, C, which adds the values found in memory locations A and B, and then puts the result in memory location C. In a pipelined processor the pipeline controller would break this into a series of instructions similar to:



The R locations are registers, temporary memory inside the CPU that is quick to access. The end result is the same: the numbers are added and the result placed in C, and the time taken to drive the addition to completion is no different from the non-pipelined case.



The key to understanding the advantage of pipelining is to consider what happens when this ADD function is "half-way done", at the ADD instruction for instance. At this point the circuitry responsible for loading data from memory is no longer being used, and would normally sit idle. In this case the pipeline controller fetches the next instruction from memory, and starts loading the data it needs into registers. That way, when the ADD instruction is complete, the data needed for the next ADD is already loaded and ready to go. The overall effective speed of the machine can be greatly increased because no parts of the CPU sit idle.

Each of the simple steps is usually called a pipeline stage; in the example above the pipeline is three stages long: a loader, an adder, and a storer.

Every microprocessor manufactured today uses at least 2 stages of pipeline. (The Atmel AVR and the PIC microcontroller each have a 2-stage pipeline.)

Example 2 

To better visualize the concept, we can look at a theoretical 3-stage pipeline:

Stage | Description
Load | Read instruction from memory
Execute | Execute instruction
Store | Store result in memory and/or registers

and a pseudo-code assembly listing to be executed:



This is how it would be executed:

Clock 1
Load | Execute | Store
LOAD | - | -

The LOAD instruction is fetched from memory.

Clock 2
Load | Execute | Store
MOVE | LOAD | -

The LOAD instruction is executed, while the MOVE instruction is fetched from memory.

Clock 3
Load | Execute | Store
ADD | MOVE | LOAD

The LOAD instruction is in the Store stage, where its result (the number 40) will be stored in the register A. In the meantime, the MOVE instruction is being executed. Since it must move the contents of A into B, it must wait for the ending of the LOAD instruction.

Clock 4
Load | Execute | Store
STORE | ADD | MOVE

The STORE instruction is loaded, while the MOVE instruction is finishing off and the ADD is calculating.

And so on. Note that, sometimes, an instruction will depend on the result of another one (like our MOVE example). When more than one instruction references a particular location for an operand, either reading it (as an input) or writing it (as an output), executing those instructions in an order different from the original program order can lead to hazards (mentioned above). There are several established techniques for either preventing hazards from occurring, or working around them if they do.
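The reading/writing cases the paragraph describes are conventionally labelled RAW (read after write), WAR (write after read) and WAW (write after write); those labels are standard textbook terms, not the article's. A minimal classifier, where `data_hazards` is a hypothetical helper and each instruction is a (reads, writes) pair of location sets:

```python
def data_hazards(first, second):
    """Classify hazards between two instructions, `first` earlier in program order.

    Each instruction is a (reads, writes) pair of sets of locations.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    kinds = set()
    if writes1 & reads2:
        kinds.add("RAW")   # second needs first's result
    if reads1 & writes2:
        kinds.add("WAR")   # second must not overwrite an input of first too early
    if writes1 & writes2:
        kinds.add("WAW")   # the final value must come from second
    return kinds


load = (set(), {"A"})     # LOAD: puts the number 40 into register A
move = ({"A"}, {"B"})     # MOVE: copies register A into register B
print(data_hazards(load, move))   # {'RAW'}
```

The MOVE example is a read-after-write case, which is why MOVE has to wait for LOAD to finish.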



Complications 



Many designs include pipelines as long as 7, 10 and even 31 stages (like in the Intel Pentium 4). The Xelerator X10q has a pipeline more than a thousand stages long [1]. The downside of a long pipeline is that when a program branches, the entire pipeline must be flushed, a problem that branch prediction helps to alleviate. Branch prediction itself can end up exacerbating the problem if branches are predicted poorly. In certain applications, such as supercomputing, programs are specially written to branch rarely, so very long pipelines are ideal for speeding up the computations, as long pipelines are designed to reduce clocks per instruction (CPI). Where branching happens constantly, re-ordering branches so that the instructions more likely to be needed are placed into the pipeline can significantly reduce the speed losses associated with having to flush failed branches. Programs such as gcov can be used to examine how often particular branches are actually executed, using a technique known as coverage analysis; however, such analysis is often a last resort for optimisation.

The higher throughput of pipelines falls short when the executed code contains many branches: the processor cannot know where to read the next instruction, and must wait for the branch instruction to finish, leaving the pipeline behind it empty. After the branch is resolved, the next instruction has to travel all the way through the pipeline before its result becomes available and the processor appears to "work" again. In the extreme case, the performance of a pipelined processor could theoretically approach that of an unpipelined processor, or even be slightly worse if all but one pipeline stage are idle and a small overhead is present between stages.
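A back-of-the-envelope sketch of why flushes hurt long pipelines more. The model is a simplification assumed for illustration: each flushed branch throws away a fixed number of cycles, other stalls are ignored, and the branch frequency, flush rates and penalties are all hypothetical numbers.

```python
def effective_cpi(branch_fraction, flush_rate, flush_penalty, base_cpi=1.0):
    """Average clock cycles per instruction once branch flushes are counted.

    A fraction `flush_rate` of branches (mispredicted or unpredicted) each
    discards `flush_penalty` cycles of work; all other stalls are ignored.
    """
    return base_cpi + branch_fraction * flush_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches.
print(f"{effective_cpi(0.20, 0.10, 3):.2f}")    # 1.06: short pipeline, good prediction
print(f"{effective_cpi(0.20, 0.10, 20):.2f}")   # 1.40: long pipeline, same prediction
print(f"{effective_cpi(0.20, 1.00, 20):.2f}")   # 5.00: long pipeline, every branch flushes
```

The same prediction quality costs far more cycles when the flush penalty grows with pipeline depth, and without useful prediction the pipeline spends most of its time refilling.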

Because of the instruction pipeline, code that the processor loads will not immediately execute. Due to this, updates in the code very near the current location of execution may not take effect because they are already loaded into the Prefetch Input Queue. Instruction caches make this phenomenon even worse. This is only relevant to self-modifying programs.



See also

■ Wait state

■ Classic RISC pipeline

■ Pipeline (computer)

■ Parallel computing

■ Branch Prediction in the Pentium Family (http://x86.org/articles/branch/branchprediction.htm)



External Links

• Ars Technica article on pipelining (http://arstechnica.com/articles/paedia/cpu/pipelining-1
Retrieved from "http://en.wikipedia.org/wiki/Instruction_pipeline"
Category: Instruction processing



■ This page was last modified 13:20, 31 July 2007.

■ All text is available under the terms of the GNU Free Documentation License. (See Copyrights for details.)

Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a U.S. registered 501(c)(3) tax-deductible nonprofit charity.



Jurij Silc • Borut Robic • Theo Ungerer

Processor Architecture

From Dataflow to Superscalar and Beyond

Springer

Dr. Jurij Silc

Computer Systems Department
Jozef Stefan Institute
Jamova 39

SI-1000 Ljubljana, Slovenia

Assistant Professor
Dr. Borut Robic

Faculty of Computer and Information Science

SI-1001 Ljubljana, Slovenia
Theo Ungerer

Department of Com[...]
University of Karlsruhe
P.O. Box 6980
D-76128 Karlsruhe

ISBN 3-540-64798-8



1. Basic Pipelining and Simple RISC Processors

[...] for virtual to physical page address translation. The TLB is organized as a fully associative [...] and usually contains 32 to 256 entries. [...]

• Support of a paging mechanism involved in the virtual memory organization, for the segmentation mechanism (if implemented), and for memory protection.

The MMU access is overlapped with the cache access during a cache organization that is called virtually indexed, physically tagged. Otherwise the MMU access has to be done before cache access (physically addressed cache) or after cache access in the case of a cache miss (virtually addressed cache). [...] cache to be compared with the physical address from the MMU. In such environments [...] may be a bottleneck [...] the MMU. A virtually indexed cache uses virtual addresses when attempting to find the required word in the cache. The least significant part of the virtual address is used to access the [...] (the part that is not mapped) that may contain the required word. The most significant part of the virtual address is then compared with the tag bits for a possible match [...]. [...] misses [...] addresses.

For more details on caches and MMU organization, see Hennessy and Patterson [...] and Shriver and Smith [...].



1.5 Basic Pipeline Stages

One of the major features of modern processors (especially RISC processors) is the use of pipelined instruction execution to achieve an average CPI close to 1. Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. It is not visible to the programmer. Each step is called a pipe stage, or pipe segment. Pipeline stages are separated by clocked pipeline registers (also called latches). A pipeline machine cycle is the time required to move an instruction one step down the pipeline.

Ideally, in a $k$-stage pipeline an instruction is executed in $k$ cycles by $k$ stages. If instruction fetching into the pipeline continues, then at any time (assuming ideal conditions) $k$ instructions will be handled simultaneously, and it will take $k$ cycles for each instruction to leave the pipeline. We define latency to be the time needed for an instruction to pass through all $k$ stages of the pipeline. The throughput of the pipeline is defined to be the number of instructions that can leave a pipeline per cycle. This rate reflects the computing power of a pipeline. In contrast to the $n \cdot k$ cycles on a hypothetical non-pipelined processor, the execution of $n$ instructions on a $k$-stage pipeline will take $k + n - 1$ cycles (assuming ideal conditions with latency $k$ cycles and throughput 1). Hence, the resulting speedup is

$$S = \frac{n k}{k + n - 1} = \frac{k}{1 + (k - 1)/n}.$$

If the number $n$ of instructions that is fed to the pipeline is infinite, then the resulting speedup equals the number of stages $k$.
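Evaluating the speedup formula for a few values of $n$ shows how slowly it approaches $k$. This sketch only restates $S = nk/(k + n - 1)$ from the text; the choice $k = 5$ and the sample values of $n$ are illustrative assumptions. Exact fractions are used so the limit behaviour is not blurred by rounding.

```python
from fractions import Fraction


def pipeline_speedup(n, k):
    """Ideal speedup S = n*k / (k + n - 1) of a k-stage pipeline for n instructions."""
    return Fraction(n * k, k + n - 1)


for n in (1, 10, 100, 10_000):
    print(n, float(pipeline_speedup(n, 5)))
```

A single instruction gains nothing ($S = 1$), ten instructions give about 3.57, and only long instruction streams get close to the stage count of 5.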



[...] stages shown in Fig. [...]. [...] pipeline. The pipeline execu[...]

[...] Such a pipeline can be found in the DLX RISC of Hennessy and Patterson [...] and in the MIPS R3000 processor (Sect. 1.7.[...]).

[...] processors in some multimedia processors.

Figure 1-1 shows the basic stages of the instruction pipeline in more detail. Pipeline stages are buffered by different pipeline registers:

• the program counter (PC) in the IF stage, [...] between the IF/ID and between the ID/EX stages,

• the instruction register between the IF/ID stages,

• the ALU input registers 1 and 2 and the immediate register between the ID/EX stages,

• the conditional register, the ALU output register, and the store value register between the EX/MEM stages, and

• the load memory data register and the ALU result register between the MEM/WB stages.

During instruction execution, the following sequence of steps is performed [...] (DLX [...] below [...] a simple load/store architecture).



Glossary



H

Harvard architecture - a computer design feature where there are two separate memory units: one for instructions and the other for data.

I-cache - a cache that only holds the instructions of a program (not data). I-caches generally do not need a write policy.

in-order issue - the situation in which instructions are sent to be executed in the same order as they appear in the program.

instruction decoder unit - the module that receives an instruction from the instruction fetch unit, identifies the type of instruction from the opcode, assembles the complete instruction with its operands, and sends the instruction to the appropriate functional unit, or to an instruction pool to await execution.

instruction fetch unit - the module that fetches instructions from memory, usually in conjunction with a bus interface unit, and prepares them for subsequent decoding and execution by one or more functional units. If an [...]

instruction format - the specification of the number and size of all possible instruction fields in an instruction set architecture.

instruction issue - the act of initiating the performance of an instruction (not its fetch). Issue policies are important design decisions in systems that use parallelism and execution out of program order to achieve more speed.

instruction-level parallelism (ILP) - the concept of executing two or more instructions in parallel (generally instructions taken from a sequential, not parallel, stream of instructions).

instruction pipeline - a structure that separates the execution of instructions into multiple phases, and executes separate instructions in each phase simultaneously.

instruction reordering - a technique in which the CPU executes instructions in an order different from that specified by the program, with the [...]

instruction scheduling - the relocation of independent instructions in order to maximize instruction-level parallelism (and/or minimize instruction stalls).

instruction set - the collection of all the machine-language instructions available to the programmer.



[...] during every cycle. Pipelines greatly improve the rate at which instructions can be executed as long as there are no dependencies. The efficient use of a pipeline requires that several instructions be executed in parallel; however, the result of any instruction is not available for several cycles after that instruction has entered the pipeline. Thus, new instructions must not depend on the results of instructions which are still in the pipeline.

pipeline [...] rate - the number of cycles that occur between the issuance of one instruction and the issuance of the next instruction to the same [...]

pipeline throughput - the number of instructions that can leave a pipeline per cycle.

pipelining - splitting the CPU into a number of stages, which allows multiple instructions to be executed concurrently.

pop instruction - an instruction that retrieves contents from the top of the stack and places the contents in a specified register.

postincrementation - an addressing mode in which the address is incremented after accessing the memory value. Used to access elements of arrays in memory.

precise interrupts - an implementation of the interrupt mechanism such that the processor can restart after the interrupt at exactly where it was interrupted. All instructions that have started prior to the interrupt should appear to have completed before the interrupt takes place, and all instructions after the interrupt should not appear to start until after the interrupt routine has finished.

predecrementation - an addressing mode using an index or address register in which the contents of the address are reduced by the size of the operand before the access is attempted.

prediction (of branches) - the act of guessing the likely outcome of a conditional branch decision. Prediction is an important technique for speeding up execution in overlapped processor designs. Increasing the depth of the prediction (the number of branch predictions that can be unresolved at any time) increases both the complexity and speed.
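The pop instruction, predecrementation and postincrementation entries fit together in a stack. The sketch below assumes a hypothetical convention (a stack that grows toward lower addresses, push using predecrementation and pop using postincrementation, operand size one word) that some architectures use but the glossary does not prescribe; the `Machine` class is purely illustrative.

```python
class Machine:
    """Toy word-addressed memory, register file and downward-growing stack.

    Hypothetical convention: push predecrements SP by the operand size
    (one word here) before the store; pop reads first, then postincrements.
    """

    def __init__(self, memory_words):
        self.mem = [0] * memory_words
        self.regs = {}
        self.sp = memory_words          # empty stack: SP just past the top word

    def push(self, value):
        self.sp -= 1                    # predecrement the stack pointer...
        self.mem[self.sp] = value       # ...then access memory

    def pop(self, reg):
        self.regs[reg] = self.mem[self.sp]  # access memory first...
        self.sp += 1                        # ...then postincrement


m = Machine(8)
m.push(7)
m.push(9)
m.pop("r1")       # pop instruction: top of stack goes into register r1
m.pop("r2")
print(m.regs, m.sp)   # {'r1': 9, 'r2': 7} 8
```

Pairing the two addressing modes this way keeps SP pointing at the current top of the stack, so each pop undoes exactly one push.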



Computer 
Architecture 
A 

Quantitative 
Approach 



David A. Patterson

UNIVERSITY OF CALIFORNIA AT BERKELEY

John L. Hennessy

STANFORD UNIVERSITY



With a Contribution by 

David Goldberg 
Xerox Palo Alto Research Center 



MORGAN KAUFMANN PUBLISHERS, INC.
SAN MATEO, CALIFORNIA



Sponsoring Editor Bruce Spatz
Production Manager Shirley Jowell
Technical Writer Walker Cunningham
Text Design Gary Head
Cover Design David Lance Goines
Copy Editor Linda Medoff
Proofreader Paul Medoff

Computer Typesetting and Graphics Fifth Street Computer Services



Library of Congress Cataloging-in-Publication Data
Patterson, David A.

Computer architecture: a quantitative approach / David A. Patterson, John L. Hennessy
p. cm.

Includes bibliographical references
ISBN 1-55860-069-8

1. Computer architecture. I. Hennessy, John L. II. Title.

QA76.9.A73P377 1990

004.2'2--dc20 89-85227

CIP

Morgan Kaufmann Publishers, Inc.

Editorial Office: 2929 Campus Drive, San Mateo, CA 94403
Order from: P.O. Box 50490, Palo Alto, CA 94303-9953

©1990 by Morgan Kaufmann Publishers, Inc.
All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, recording, or otherwise) without the prior permission of the publisher.

All instruction sets and other design information of the DLX computer system contained herein is copyrighted by the publisher and may not be incorporated in other publications or distributed by media without formal acknowledgement and written consent from the publisher. Use of the DLX in other publications for educational purposes is encouraged and application for permission is welcomed.

ADVICE, PRAISE, & ERRORS: Any correspondence related to this publication or intended for the authors should be addressed to the editorial offices of Morgan Kaufmann Publishers, Inc., Dept. P&H APE. Information regarding error sightings is encouraged. Any error sightings that are accepted for correction in subsequent printings will be rewarded by the authors with a payment of $1.00 (U.S.) per correction upon availability of the new printing. Electronic mail can be sent to bugs3@vsop.stanford.edu. (Please include your full name and permanent mailing address.)

INSTRUCTOR SUPPORT: For information on classroom software and other instructor materials available to adopters, please contact the editorial offices of Morgan Kaufmann Publishers, Inc. (415) 578-9911.



Fourth Printing 



Pipelining



6.1 What Is Pipelining?

Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. Today, pipelining is the key implementation technique used to make fast CPUs.

A pipeline is like an assembly line: each step in the pipeline completes a part of the instruction. As in a car assembly line, the work to be done in an instruction is broken into smaller pieces, each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, are processed through the stages, and exit at the other end.

The throughput of the pipeline is determined by how often an instruction exits the pipeline. Because the pipe stages are hooked together, all the stages must be ready to proceed at the same time. The time required between moving an instruction one step down the pipeline is a machine cycle. The length of a machine cycle is determined by the time required for the slowest pipe stage (because all stages proceed at the same time). Often the machine cycle is one clock cycle (sometimes it is two, or rarely more), though the clock may have multiple phases.

The pipeline designer's goal is to balance the length of the pipeline stages. If the stages are perfectly balanced, then the time per instruction on the pipelined machine, assuming ideal conditions (i.e., no stalls), is equal to

$$\frac{\text{Time per instruction on nonpipelined machine}}{\text{Number of pipe stages}}$$

Under these conditions, the speedup from pipelining equals the number of pipe stages. Usually, however, the stages will not be perfectly balanced; furthermore, pipelining does involve some overhead. Thus, the time per instruction on the pipelined machine will not have its minimum possible value, though it can be close (say within 10%).

Pipelining yields a reduction in the average execution time per instruction. This reduction can be obtained by decreasing the clock cycle time of the pipelined machine or by decreasing the number of clock cycles per instruction, or by both. Typically, the biggest impact is in the number of clock cycles per instruction, though the clock cycle is often shorter in a pipelined machine (especially in pipelined supercomputers). In the advanced pipelining sections of this chapter we will see how deep pipelines can be used to both decrease the clock cycle and maintain a low CPI.

Pipelining is an implementation technique that exploits parallelism among the instructions in a sequential instruction stream. It has the substantial advantage that, unlike some speedup techniques (see Chapters 7 and 10), it is not visible to the programmer. In this chapter we will first cover the concept of pipelining using DLX and a simplified version of its pipeline. We will then look at the problems pipelining introduces and the performance attainable under typical situations. Later in the chapter we will examine advanced techniques that can be used to overcome the difficulties that are encountered in pipelined machines and that may lower the performance attainable from pipelining.

We use DLX largely because its simplicity makes it easy to demonstrate the principles of pipelining. The same principles apply to more complex instruction sets, though the corresponding pipelines are more complex. We will see an example of such a pipeline in the Putting It All Together section.



6.2 The Basic Pipeline for DLX

Remember that in Chapter 5 (Section 5.3) we discussed how DLX could be implemented with five basic execution steps:

1. IF - instruction fetch

2. ID - instruction decode and register fetch

3. EX - execution and effective address calculation

4. MEM - memory access

5. WB - writeback



The Winn L. Rosch
Hardware Bible
Third Edition



Winn L. Rosch



SAMS

PUBLISHING

201 West 103rd Street
Indianapolis, Indiana 46290



Copyright © 1994 by Sams Publishing 

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein. For information, address Sams Publishing, 201 W. 103rd St., Indianapolis, IN 46290.

International Standard Book Number: 1-56686-127-6
Library of Congress Catalog Card Number: 93 07:>014 
97 96 95 10 9 8 7 6 5 

Interpretation of the printing code: the rightmost double-digit number is the year of the book's printing; the rightmost single-digit, the number of the book's printing. For example, a printing code of 94-1 shows that the first printing of the book occurred in 1994.

Composed in AGaramond and MCPdigital by Macmillan Computer Publishing

Printed in the United States of America

Trademarks: All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Sams Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.






The Arithmetic/Logic Unit 

The arithmetic/logic unit handles all the decision making (the mathematical computations and logic functions) that is performed by the microprocessor. The unit takes the instructions decoded by the control unit and either carries them out directly or executes the appropriate microcode to modify the data contained in its registers. The results are passed back out of the microprocessor through the I/O unit.

Because higher clock speeds make circuit boards and integrated circuits more difficult to design and manufacture, engineers have a strong incentive to get their microprocessors to process more instructions at a given speed. Most modern microprocessor design techniques are aimed at exactly that.

One way to speed up the execution of instructions is to reduce the number of internal steps the microprocessor must take for execution. Step reduction can take two forms: making the microprocessor more complex so that steps can be combined, or making the instructions simpler so that fewer steps are required. Both approaches have been used successfully by microprocessor designers, the former as CISC microprocessors, the latter as RISC.

Another way of trimming cycles required by programs is to operate on more than one instruction simultaneously. Two approaches to processing more instructions at once are pipelining and superscalar architecture.

Pipelining 

In older microprocessor designs, a chip works single-mindedly. It reads an instruction from memory, carries it out step by step, and then advances to the next instruction. Pipelining enables a microprocessor to read an instruction, start to process it, and then, before finishing with the first instruction, read another instruction. Because every instruction requires several steps, each in a different part of the chip, several instructions can be worked on at once and passed along through the chip like a bucket brigade. Intel's Pentium chips, for example, have four levels of pipelining, so up to four different instructions may be undergoing different phases of execution at the same time inside the chip.

Pipelining is very powerful, but it is also demanding. The pipeline must be carefully organized, and the parallel paths kept carefully in step. It's like a chorus singing a canon like Frere Jacques: one missed beat and the harmony falls apart. If one of the execution streams delays, all the rest delay as well. The demands of pipelining are one factor pushing microprocessor designers to make all instructions execute in the same number of clock cycles. Keeping the pipeline in step is easier this way.

In general, the more stages to a pipeline, the greater acceleration it can offer. But real-world programs conspire against lengthy pipelines. Nearly all programs branch. That is, their execution can take alternate paths down different instruction streams depending on the results of



Chapter 3: Microprocessors



calculations and decision-making. A pipeline can load up with instructions of one program branch before it discovers that another branch is the one the program is supposed to follow. In that case, the entire contents of the pipeline must be dumped, and the whole thing loaded up again. The result is a lot of logical wheel-spinning and wasted time. The bigger the pipeline, the more time wasted. The waste resulting from branching begins to outweigh the benefits of bigger pipelines in the vicinity of five stages.

Today's most powerful microprocessors are adopting a technology called branch prediction logic to deal with this problem. The microprocessor makes its best guess at which branch a program will take as it is filling up the pipeline. Such guesses are good enough to make pipelines of five, six, and seven stages beneficial to overall performance.
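The text does not say how the guess is made. One widely used scheme, shown here only as an illustration and not as the logic of any particular chip, keeps a two-bit saturating counter per branch, so two wrong guesses in a row are needed before the prediction flips. In this minimal Python sketch the branch address, the starting counter value and the loop pattern are assumptions chosen for illustration.

```python
from collections import defaultdict


class TwoBitPredictor:
    """Toy branch predictor: one 2-bit saturating counter per branch address.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self) -> None:
        # Assumption: every unseen branch starts at 1 ("weakly not taken").
        self.counters = defaultdict(lambda: 1)

    def predict(self, branch_addr) -> bool:
        return self.counters[branch_addr] >= 2

    def update(self, branch_addr, taken: bool) -> None:
        c = self.counters[branch_addr]
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_mispredictions(outcomes, branch_addr=0x40) -> int:
    """Feed the actual outcomes of one (hypothetical) branch and count wrong guesses.

    Each wrong guess marks a point where a real pipeline would be flushed.
    """
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses


# A loop branch: taken 9 times, then falls through once; the loop runs 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
print(count_mispredictions(loop_pattern))  # 4
```

For this loop-style pattern the counter misses only on the very first encounter and at each of the three loop exits, 4 of 30 branches, whereas always guessing "not taken" would miss 27 of them.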

Superscalar Architectures 

The steps in a program normally are listed sequentially, but they don't always need to be carried out exactly in order. Just as tough problems can be broken into easier pieces, program code can be divided as well. If, for example, you want to know the larger of two rooms, you need to compute the volume of each, and then make your comparison. If you had two brains, you could compute the two volumes simultaneously. A superscalar microprocessor design does essentially that. By providing two or more execution paths for programs, it can process two or more program parts simultaneously. Of course, the chip needs enough innate intelligence to determine which problems can be split up and how to do it. The Pentium, for example, has two parallel, pipelined execution paths.
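The "two brains" idea can be sketched as a hypothetical in-order machine with two execution paths: two adjacent instructions issue in the same cycle only when the second neither reads nor overwrites the result the first is still producing. The instruction format, the register names and this single pairing rule are assumptions for illustration; the Pentium's real pairing rules also depend on the instruction types involved, and each room volume is treated here as one instruction for simplicity.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


def can_pair(first: Instr, second: Instr) -> bool:
    """The second instruction may issue alongside the first only if it neither
    reads nor overwrites the register the first one is producing."""
    return first.dest not in second.srcs and first.dest != second.dest


def issue_cycles(program: List[Instr]) -> int:
    """Greedy in-order issue into two pipelines; returns the issue cycles used."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # both pipelines receive an instruction this cycle
        else:
            i += 1  # only one pipeline receives an instruction
        cycles += 1
    return cycles


# The two-rooms example with hypothetical register names.
rooms = [
    Instr("vol1 = l1 * w1 * h1", "vol1", ("l1", "w1", "h1")),
    Instr("vol2 = l2 * w2 * h2", "vol2", ("l2", "w2", "h2")),
    Instr("big = max(vol1, vol2)", "big", ("vol1", "vol2")),
]
print(issue_cycles(rooms))  # 2: the volumes pair up, the comparison must wait
```

A single execution path would need three issue cycles for the same three steps; the comparison cannot be paired because it depends on both volumes.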

The first superscalar computer design was the Control Data Corporation 6600 mainframe, introduced in 1964. Designed specifically for intense scientific applications, the initial 6600 machines were built from eight functional units and were the fastest computers in the world at the time of their introduction.

Superscalar architecture gets its name because it goes beyond the incremental increase in speed made possible by scaling down microprocessor technology. An improvement to the scale of a microprocessor design would reduce the size of the microcircuitry on the silicon chip. The size reduction shortens the distance signals must travel and lowers the amount of heat generated by the circuit (because the elements are smaller and need less current to effect changes). Some microprocessor designs lend themselves to scaling down. Superscalar designs get a more substantial performance increase by incorporating a more dramatic change in circuit complexity.

Using pipelining and superscalar architecture cycle-saving techniques has cut the number of cycles required for the execution of a typical microprocessor instruction dramatically. Early microprocessors needed, on average, several cycles for each instruction. Many of today's chips (both CISC and RISC) actually have average instruction throughputs of less than one cycle per



Microsoft Press

Computer
Dictionary

Third Edition



Microsoft Press



PUBLISHED BY
Microsoft Press

A Division of Microsoft Corporation

One Microsoft Way

Redmond, Washington 98052-6399

Copyright © 1997 by Microsoft Corporation

All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Microsoft Press Computer Dictionary. -- 3rd ed.
p. cm.
ISBN 1-57231-446-X

1. Computers--Dictionaries. 2. Microcomputers--Dictionaries.
I. Microsoft Press.
QA76.15.M54 1997

004'.03--dc21 97-15489

CIP 

Printed and bound in the United States of America.

5 6 7 8 9 QMQM 2 10 9 8 

Distributed to the book trade in Canada by Macmillan of Canada, a division of Canada Publishing Corporation.

A CIP catalogue record for this book is available from the British Library.

Microsoft Press books are available through booksellers and distributors worldwide. For further information about international editions, contact your local Microsoft Corporation office. Or contact Microsoft Press International directly at fax (425) 936-7329.

Macintosh, Power Macintosh, QuickTime, and TrueType are registered trademarks of Apple Computer, Inc. Intel is a registered trademark of Intel Corporation. DirectInput, DirectX, Microsoft, Microsoft Press, MS-DOS, Visual Basic, Visual C++, Win32, Win32s, Windows, Windows NT, and XENIX are registered trademarks and ActiveMovie, ActiveX, and Visual J++ are trademarks of Microsoft Corporation. Java is a trademark of Sun Microsystems, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners.

Acquisitions Editor: Kim Fryer 

Project Editor: Maureen Williams Zimmerman, Anne Taussig

Technical Editors: Dail Magee Jr., Gary Nelson, Jean Ross, Jim Fuchs, John Conrow, Kurt Meyer, Robert Lynn, Roslyn Lutsch



pipe 



pixel map




Pin grid array. The pin grid array on the back of a Pentium chip.

pipe \pīp\ n. 1. A portion of memory that can be used by one process to pass information along to another. Essentially, a pipe works like its namesake: it connects two processes so that the output of one can be used as the input to the other. See also input stream, output stream. 2. The vertical line character (|) that appears on a PC keyboard as the shift character on the backslash (\) key. 3. In UNIX, a command function that transfers the output of one command to the input of a second command.
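Definitions 1 and 3 can be seen together in a shell-style pipeline. The Python sketch below, following the pipeline pattern described in the Python subprocess documentation, connects two hypothetical child processes with an operating-system pipe so that the lines printed by the first become the input of the second, much like "producer | sort" at a UNIX prompt. The two one-line programs are illustrative only.

```python
import subprocess
import sys

# Hypothetical producer and consumer programs, run with the current interpreter.
PRODUCER = "print('banana'); print('apple'); print('cherry')"
CONSUMER = "import sys; print(''.join(sorted(sys.stdin.readlines())), end='')"


def run_pipeline() -> str:
    """Run PRODUCER | CONSUMER and return what the consumer writes."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,      # the pipe: producer's output feeds consumer's input
        stdout=subprocess.PIPE,
        text=True,
    )
    producer.stdout.close()  # parent drops its copy so only the consumer reads the pipe
    output, _ = consumer.communicate()
    producer.wait()
    return output


print(run_pipeline(), end="")
```

The consumer never sees the producer directly; it simply reads its standard input, which the operating system has connected to the other process's output.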

pipeline processing \pīp´līn pros´es-ēng\ n. A method of processing on a computer that allows fast parallel processing of data. This is accomplished by overlapping operations using a pipe, or a portion of memory that passes information from one process to another. See also parallel processing, pipe (definition 1), pipelining (definition 3).

pipelining \pīp´līn-ēng\ n. 1. A method of fetching and decoding instructions (preprocessing) in which, at any given time, several program instructions are in various stages of being fetched or decoded. Ideally, pipelining speeds execution

have to wait for instructions; when it completes




performing a particular type of operation. 3. The use of pipes in passing the output of one task as input to another until a desired sequence of tasks has been carried out. See also pipe (definition 1).



piracy \pī´rə-sē\ n. 1. The theft of a computer design or program. 2. Unauthorized distribution and use of a computer program.
.pit \dot-pit´, dot´P-I-T´\ n. A file extension for an archive file compressed with PackIt. See also PackIt.

pitch \pich\ n. 1. A measure, generally used with monospace fonts, that describes the number of characters that fit in a horizontal inch. See also characters per inch. Compare point (definition 1). 2. See screen pitch.

pixel \pik´səl\ n. Short for picture (pix) element. One spot in a rectilinear grid of thousands of such spots that are individually "painted" to form an image produced on the screen by a computer or on paper by a printer. A pixel is the smallest element that display or print hardware and software can manipulate in creating letters, numbers, or graphics. See the illustrations. Also called pel.




Pixel. The letter A (top) is actually made up of a pattern of pixels in a grid, as is the cat's eye (bottom).
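The illustration's point, that a letter is just a grid of individually painted spots, can be reproduced in a few lines of Python. The 5-by-7 bitmap below is a hypothetical design for the capital A, not the one in the illustration; "1" marks a painted pixel and "0" a blank one.

```python
# Hypothetical 5x7 bitmap of the capital letter A, one string per pixel row.
LETTER_A = [
    "00100",
    "01010",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
]


def render(bitmap, on="#", off="."):
    """Turn a grid of 0/1 pixels into printable text, one line per row."""
    return "\n".join(
        "".join(on if bit == "1" else off for bit in row) for row in bitmap
    )


def lit_pixels(bitmap) -> int:
    """Count how many pixels in the grid are switched on."""
    return sum(row.count("1") for row in bitmap)


print(render(LETTER_A))
print(lit_pixels(LETTER_A))  # 16
```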

pixel image \pik´səl im´əj\ n. The representation of a color graphic in a computer's memory. A pixel image is similar to a bit image, which also describes a screen graphic, but a pixel image has an added dimension, sometimes called depth, that

pixel map \pik´səl map\ n. A data structure that describes the pixel image of a graphic, including such features as color, image, resolution, dimen-



The Indispensable 
PC Hardware Book 

FOURTH EDITION 

Hans-Peter Messmer 



Addison-Wesley



PEARSON EDUCATION LIMITED



Head Office: 

Harlow CM20 2JE
Tel: +44 (0)1279 623623
Fax: +44 (0)1279 431059



London Office: 
128 Long Acre
London WC2E 9AN
Tel: +44 (0)20 7447 2000
Fax: +44 (0)20 7240 5771



Websites: www.it-minds.com
www.aw.com/cseng/



First published in Great Britain in 2002

© Pearson Education Ltd 2002

The right of Hans-Peter Messmer to be identified as Author of this Work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

ISBN 0 201 59616 4

British Library Cataloguing-in-Publication Data

A CIP catalogue record for this book can be obtained from the British Library

Library of Congress Cataloging-in-Publication Data
Messmer, Hans-Peter.

[PC-Hardwarebuch. English]

The indispensable PC hardware book / Hans-Peter Messmer.--4th ed.

p. cm.
Includes index

ISBN 0-201-59616-4 (alk. paper)

1. Computer input-output equipment. 2. Microcomputers.

TK7887.5 .M4613 2001
004.165--dc21



All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without either the prior written permission of the Publishers or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1P 0LP. This book may not be lent, resold, hired out or otherwise disposed of by way of trade in any form of binding or cover other than that in which it is published, without the prior consent of the Publishers.

The programs in this book have been included for their instructional value. The publisher does not offer any warranties or representations in respect of their fitness for a particular purpose, nor does the publisher accept any liability for any loss or damage arising from their use.

10 9 8 7 6 5 4 3 2 1 

Translated by TransScript Alba Ltd, Edinburgh, Scotland.
Typeset by Pantek Arts, Maidstone, Kent.

Printed and bound in Great Britain by Biddles Ltd of Guildford and King's Lynn.
The Publishers' policy is to use paper manufactured from sustainable forests.



2001052081 



216 



Reduced instruction set and hardwired instructions 

Closely related to the abbreviation RISC is the reduction of the almost unlimited instruction set of highly complex CISCs. One of the first prototypes that implemented the RISC concept, the RISC I, had 31 instructions, whereas its successor, the RISC II, had 39. The simplicity of the processor structure is shown by the reduced number of integrated transistors: in the RISC II there are only 41 000 (in comparison with more than 1 million in the i486 and 3 million in the Pentium). What is also interesting is that the RISC prototypes already had an additional on-chip cache, which was larger than the actual processor. In the i486 the supporting units for the processor take up more space on the processor chip than the highly efficient CPU itself.
One additional very important characteristic is that the instructions (or, put more precisely, the control unit CU) are hardwired. This means that in a RISC processor, the execution unit (EU) is no longer controlled by the CU with the assistance of extensive microcode. Instead, the whole operation is achieved in the form of hardwired logic. This greatly speeds up the execution of an instruction.

For example, in a CISC the complexity of a multiplication instruction is located in very extensive microcode which controls the ALU. In contrast, for a RISC CPU the chip designers put the complexity in a complicated hardware multiplier. Typically, in a CISC CPU multiplications are carried out by many additions and shifts, whereas a RISC multiplier performs that operation in one or two passes (depending on the precision). Due to the reduced number of machine instructions, there is now enough space on the chip for implementing such highly complex types of circuit.
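The "many additions and shifts" that a microcoded multiplication performs can be sketched as the classic shift-and-add loop below. It is a minimal Python illustration of the technique, not the microcode of any real CISC processor; it handles non-negative operands only and also reports how many loop passes were needed, which is what a hardware multiplier replaces with one or two passes.

```python
def shift_add_multiply(a: int, b: int):
    """Multiply two non-negative integers with one conditional add and one
    shift per bit of the multiplier b. Returns (product, loop_iterations)."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product, steps = 0, 0
    while b:
        if b & 1:      # lowest multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1        # shift the multiplicand left by one bit
        b >>= 1        # move on to the next multiplier bit
        steps += 1
    return product, steps


print(shift_add_multiply(13, 11))  # (143, 4)
```

With a 32-bit multiplier the loop can take up to 32 passes, each costing at least one cycle in a microcoded implementation.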

Instruction pipelining 

As a result of the basic principles on which microprocessors work, the execution structure of an instruction is the same for the majority of machine code instructions. The following steps must be carried out:

- Read the instruction from memory (instruction fetching).

- Decode the instruction (decoding phase).

- Where necessary, fetch operand(s) (operand fetching phase).

- Execute the instruction (execution phase).

- Write back the result (write-back phase).

The instruction is decoded during the decoding phase and, in most cases, the operand addresses are determined here. In a CISC processor the first step, instruction fetching, is performed by the bus interface and the prefetcher as soon as there is enough space in the prefetch queue. Even the second step, the decoding of the instruction, is executed in the decoding unit prior to the instruction execution itself, so the decoded microcode is available in the microcode queue. The remaining three steps are executed by microcode in the execution unit under the control of the CU. Under normal circumstances a single clock cycle is not sufficient for these steps, unless the clock cycle is made very long, that is, the clock rate very low.
Machine instructions are very well suited for pipelined execution. For comparison, let us look at address pipelining, which we have already met. In one complete bus cycle there are at least two largely independent sequential processes: memory addressing and data transfer. Pipelined addressing now means that the addressing phase of the following bus cycle overlaps with the



Chapter 8 All in One - The i486 



217 



data transfer phase of the current bus cycle. Application of this principle to instruction pipelining means that the above-mentioned five basic phases for successive instructions are each shifted by one stage relative to one another.

The decisive factor for the success of instruction pipelining is not that an instruction is processed completely within one cycle but instead that an instruction is completed in every cycle. What at first appears linguistically subtle has enormous consequences. Here, each executable instruction is divided into a set number of sub-steps, such that the processor executes every sub-step in a single stage of a pipeline in one single clock cycle. This achieves the intended aim: single-cycle machine instructions. This means that, ideally, one machine code instruction completes in each processor clock cycle, or, put another way, only one clock cycle per instruction is necessary, thus clocks per instruction (CPI) = 1. This is shown in Figure 8.6.



[Figure 8.6: diagram of successive instructions occupying the pipeline stages in cycles n, n+1, n+2, ...]



Figure 8.6: Instruction pipelining. Each instruction is split into parts in the five-stage pipeline so that it can be executed. The parts are executed within one clock cycle each. Therefore, for example, although instruction k requires five complete cycles for its execution, one instruction result is available in each cycle at the end of the pipeline.
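The diagonal pattern of Figure 8.6 can be regenerated with a short Python sketch of an ideal pipeline with no interlocks: instruction i enters stage s in cycle i + s. The stage labels IF, ID, OF, EX and WB are abbreviations chosen here for the five phases listed earlier, not names used in the book.

```python
# Hypothetical labels: fetch, decode, operand fetch, execute, write-back.
STAGES = ["IF", "ID", "OF", "EX", "WB"]


def pipeline_table(n_instructions: int, stages=STAGES):
    """Ideal pipeline without stalls: instruction i occupies stage s in cycle i + s.

    Returns one row per clock cycle; each row lists the instruction index in
    each stage, or None where the stage is empty (filling or draining)."""
    depth = len(stages)
    table = []
    for cycle in range(n_instructions + depth - 1):
        row = []
        for s in range(depth):
            i = cycle - s  # the instruction that entered stage 0 s cycles ago
            row.append(i if 0 <= i < n_instructions else None)
        table.append(row)
    return table


print("cycle    " + " ".join(f"{name:>3}" for name in STAGES))
for cycle, row in enumerate(pipeline_table(6)):
    labels = [f"i{i}" if i is not None else "--" for i in row]
    print(f"cycle {cycle:<2} " + " ".join(f"{label:>3}" for label in labels))
```

From cycle 4 onward every stage is busy and one instruction leaves the write-back stage per cycle, which is the CPI = 1 behaviour described above.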



As you can clearly see from the figure, the processor commences with the execution of the nth instruction as soon as the (n-1)th instruction has left the first pipeline stage. In other words, the control unit starts the instruction fetching phase for the nth instruction as soon as the (n-1)th instruction enters the decoding phase. In this example of a five-stage pipeline, under ideal circumstances, five instructions can be found in different execution phases. It can be optimistically assumed that one processor clock cycle (PCLK) is necessary per instruction phase and, therefore, an instruction is always executed within five clock cycles. As there are five



218 



instructions simultaneously in the pipeline, which are each displaced by one clock cycle (PCLK), an instruction result is available from the pipeline in each clock cycle (that is, each stage contains an instruction in a different phase). Normally, a register is situated between the individual pipeline stages; it serves as the output register for the preceding pipeline stage and, at the same time, as the input register for the following pipeline stage. In comparison, without pipelining (as is normally the case with CISC processors), the instruction fetching phase of the nth instruction starts only after the (n-1)th instruction is completed, that is, after five clock cycles. Ideally, the overlapping of the instructions alone leads to an increase in speed by a factor of five (!) without the need for increasing the clock rate.
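The factor of five follows from simple counting. Without pipelining, n instructions of k phases take n*k cycles; an ideal k-stage pipeline needs k cycles for the first result and one more cycle per further instruction, k + n - 1 in total. The ratio n*k / (k + n - 1) therefore approaches k only for long instruction streams. This minimal sketch assumes no interlocks, branches or wait states.

```python
def cycles_unpipelined(n: int, k: int = 5) -> int:
    """Each instruction runs through all k phases before the next one starts."""
    return n * k


def cycles_pipelined(n: int, k: int = 5) -> int:
    """Ideal k-stage pipeline: k cycles until the first result, then one per cycle."""
    return k + (n - 1) if n > 0 else 0


for n in (1, 10, 1000):
    speedup = cycles_unpipelined(n) / cycles_pipelined(n)
    print(n, round(speedup, 2))
```

The printed speedups are about 1.0, 3.57 and 4.98: a single instruction gains nothing, while a long stream gets close to the ideal factor of five.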
The five-stage pipeline represented in Figure 8.6 is just an example. With some processors, phases are combined into one single phase; for example, the decoding phase and the operand fetching phase (which is closely linked to the decoding phase) may be executed in a single pipeline stage. The result would be a four-stage pipeline. On the other hand, the individual instruction phases can be sub-divided even further, until each element has its own sub-phase. Thus, through simplicity, very quick pipeline stages can be implemented. This allows the clock rate to be increased. Such a strategy leads to a superpipelined architecture with many pipeline stages (ten or more). This superpipelined architecture allows the Alpha to achieve its speed of 300 MHz and the Pentium and Pentium Pro to achieve 200 MHz.

Another possibility for increasing the performance of a RISC microprocessor is the integration of several pipelines operating in parallel. With this method, the result is a superscalar architecture. One example is the Pentium with two pipelines operating in parallel. I am sure you can imagine that this increases the complexity of coordinating the components with one another still further. Here, not only the individual pipeline stages have to cooperate but also the different pipelines themselves.

Pipeline Interlocks 

You can recognize one serious problem for the implementation of instruction pipelining, for example, with the following two instructions:
ADD eax, [ebx+ecx]
MOV edx, [eax+ecx]

The value of the eax register for the address calculation of the second operand in the MOV instruction is only known after the execution phase of the ADD instruction. On the other hand, the MOV instruction can already be found in the decoding phase, where the operand address [eax+ecx] is generated, while the ADD instruction is still in the execution phase. At this time, the decoding phase cannot determine the operand address. The CPU control unit must recognize such data and register dependencies, which lead to pipeline interlocks, and react accordingly. The problem always appears when a following instruction n+1 (or also n+2) in an earlier pipeline stage needs the result of instruction n from a later stage.
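The dependency check the control unit must perform can be sketched in Python for the two-operand instructions of the example. This is a simplified model under stated assumptions: the first operand is always a register destination, every register named in the second operand (including inside [..] address expressions) counts as a source, ADD also reads its destination while MOV does not, and only the n+1 and n+2 neighbours are examined, as in the text. The helper names are hypothetical.

```python
import re

# Assumed register set for this sketch: the eight 32-bit x86 general registers.
REGISTER = re.compile(r"\b(e[a-d]x|e[sd]i|e[bs]p)\b")


def parse(line: str):
    """Split a two-operand Intel-syntax instruction into (dest, sources)."""
    mnemonic, operands = line.split(None, 1)
    dest, src = (op.strip() for op in operands.split(",", 1))
    sources = set(REGISTER.findall(src))  # includes address-calculation registers
    if mnemonic.upper() != "MOV":
        sources.add(dest)  # ADD reads its destination as well as writing it
    return dest, sources


def raw_hazards(program, window: int = 2):
    """Return (n, m) index pairs where instruction m reads the register that an
    instruction n at most `window` positions earlier is still producing."""
    parsed = [parse(line) for line in program]
    hazards = []
    for m, (_, sources) in enumerate(parsed):
        for n in range(max(0, m - window), m):
            if parsed[n][0] in sources:
                hazards.append((n, m))
    return hazards


print(raw_hazards(["ADD eax, [ebx+ecx]", "MOV edx, [eax+ecx]"]))  # [(0, 1)]
```

A reported pair is exactly the situation described above: the later instruction must be held back (interlocked) until the earlier one has produced eax.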
The simplest solution is to delay the operand calculation in the decoding phase by one clock cycle. The Berkeley RISC concept uses scoreboarding to deal with this pipeline obstruction. For this, a bit is attached to each processor register. For machine instructions that write to a processor register, the control unit initially sets the bit of that target register to show that the register value is not yet defined. The bit is cleared only when the register is written during the execution phase and its new content is valid. If a subsequent instruction wishes to use the register as an operand source, it checks whether the scoreboarding bit is set, that is, whether the content is undefined. If this is the case, the



Glossary 



1239 



Inquiry cycle 

Also called snoop cycle. A bus cycle to a processor with an on-chip cache or to a cache controller 
to investigate whether a certain address is present in the applicable cache. 
Instruction pipelining 

Generally, instructions show very similar execution steps; for example, every instruction has to be fetched, decoded and executed, and the results need to be written back into the destination register. With instruction pipelining the execution of every instruction is separated into more elementary tasks. Each task is carried out by a separate stage of an instruction pipeline (ideally in one single clock cycle) so that, at a given time, several instructions are present in the pipeline at different stages in different execution states. Thus, not every instruction is executed completely in one clock cycle, but one instruction is completed every clock cycle.
Intel 

An important US firm which manufactures microelectronic components, for example memory chips and processors. Intel is regarded as the inventor of the microprocessor.

Interlock 

If a stage in a pipeline needs a result or a system resource of another stage which is not yet available, this is called an interlock. Interlocks arise, for example, when a composite expression is being calculated and the evaluation of the partial expressions is still in progress. The requesting pipeline stage then has to wait until the other pipeline stage has completed its calculations.
Internet 

A worldwide network (WAN) which was initially intended to enable data exchange between universities and research institutes. Today, any PC user who has access to a modem and a telephone line can access the Internet.
Interrupt (software, hardware) 

A software interrupt is issued by an explicit interrupt instruction INT; a hardware interrupt, however, is transmitted via an IRQ line to the processor. In both cases, the processor saves flags, instruction pointer and code segment on the stack, and calls a specific procedure, the interrupt handler.
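The save-and-call sequence can be modelled as a toy in Python. The sketch follows x86 real-mode behaviour, where the processor pushes FLAGS, then CS, then IP, clears the interrupt flag and loads the handler address from the interrupt vector table; the class name, the register values and the handler address are hypothetical, and the model ignores protected-mode gates, privilege changes and the trap flag.

```python
class ToyCPU:
    """Hypothetical model of real-mode style interrupt entry and return.

    ivt maps an interrupt number to a (segment, offset) handler address."""

    IF_BIT = 0x0200  # interrupt-enable flag, bit 9 of FLAGS

    def __init__(self, ivt):
        self.ivt = ivt
        self.flags, self.cs, self.ip = 0x0202, 0x1000, 0x0100  # illustrative values
        self.stack = []

    def interrupt(self, number: int) -> None:
        """Entry sequence for both INT n (software) and a hardware IRQ."""
        self.stack.append(self.flags)  # save flags ...
        self.stack.append(self.cs)     # ... code segment ...
        self.stack.append(self.ip)     # ... and instruction pointer
        self.flags &= ~self.IF_BIT     # block further maskable interrupts
        self.cs, self.ip = self.ivt[number]  # jump to the interrupt handler

    def iret(self) -> None:
        """Return from the handler, restoring state in reverse order."""
        self.ip = self.stack.pop()
        self.cs = self.stack.pop()
        self.flags = self.stack.pop()


cpu = ToyCPU({0x21: (0xF000, 0x1234)})  # hypothetical handler for INT 21h
cpu.interrupt(0x21)
print(hex(cpu.cs), hex(cpu.ip), [hex(word) for word in cpu.stack])
cpu.iret()
```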

Interrupt descriptor table
See IDT.

Interrupt descriptor table register 

See IDTR.

Interrupt gate 

A gate descriptor used to call an interrupt handler. Unlike a trap gate the interrupt gate clears the 
interrupt flag and therefore disables external interrupt requests. 

Interrupt handler 

See interrupt.
I/O 

Abbreviation of input/output 
I/O-mapped I/O

With I/O-mapped I/O the registers of peripherals are accessed via the I/O address space, that is, ports.



1250 






PCI 

Abbreviation of peripheral component interconnect. A local bus standard initiated by Intel that usually has a bus width of 32 bits and operates at 33 MHz. A 64-bit version is intended with the forthcoming standard 2.0. Characteristic of PCI is the decoupling of processor and expansion bus by means of a bridge. The transfer rate reaches 133 MB/s at 32 bits and 266 MB/s at 64 bits; bursts can be of any length.
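The quoted transfer rates follow from bus width times clock rate, assuming one data transfer per clock during a burst. In this Python sketch the clock is taken as the nominal 33 1/3 MHz; the figures 133 and 266 are the results truncated to whole megabytes per second.

```python
PCI_CLOCK_HZ = 100e6 / 3  # nominal "33 MHz" PCI clock, i.e. 33 1/3 MHz


def peak_transfer_rate_mb_s(bus_width_bits: int, clock_hz: float) -> float:
    """Peak burst rate in MB/s (10**6 bytes per second), assuming one transfer
    per clock: bytes per transfer times transfers per second."""
    return (bus_width_bits // 8) * clock_hz / 1e6


print(int(peak_transfer_rate_mb_s(32, PCI_CLOCK_HZ)))  # 133
print(int(peak_transfer_rate_mb_s(64, PCI_CLOCK_HZ)))  # 266
```

These are burst peaks; address phases and wait states lower the sustained rate in practice.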

PCMCIA 

Abbreviation of Personal Computer Memory Card International Association. An interface for credit-card-sized adapters which are inserted into a PCMCIA slot.

Pentium 

A powerful member of the 80x86 family and successor of the i486. The outstanding characteristic is the superscalar architecture with the two integer pipelines u and v. They can execute simple instructions in parallel, that is, complete two instructions within one clock cycle. An improved floating-point unit further enhances performance.
Pentium Pro 

Intel's newer processor generation with 32-bit technology, on which all subsequent models are based (e.g. Pentium III). The Pentium Pro integrates an L2 cache together with the CPU die in one single package. The cache runs through a dedicated L2 cache bus at the full CPU clock.
Peripheral 

A device or unit located outside the system's CPU/RAM. 
PGA 

Abbreviation of pin grid array. A package where the terminals are provided in the form of pins 
at the bottom of the package. 
Physical address space 

The number of physically addressable bytes, determined by the number of address lines of a processor or the amount of installed memory.

PIC 

Abbreviation of programmable interrupt controller. A chip used to manage several hardware interrupts and to pass the requests in an ordered way to a CPU, which usually has only one input for this type of interrupt request. Thus the PIC acts as a multiplexer for hardware interrupts.
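How a PIC chooses which of several pending requests to forward can be sketched in Python for an 8259-style controller in its default fixed-priority mode, where IRQ0 has the highest and IRQ7 the lowest priority. The function name is hypothetical, and the sketch deliberately ignores the in-service register, cascading and rotating-priority modes.

```python
def highest_priority_irq(irr: int, imr: int):
    """Return the IRQ number an 8259-style PIC would forward first, or None.

    irr: interrupt request register, bit n set = IRQn is requesting service.
    imr: interrupt mask register, bit n set = IRQn is masked (ignored).
    Assumes fixed priority with IRQ0 highest and IRQ7 lowest."""
    pending = irr & ~imr & 0xFF
    if pending == 0:
        return None
    return (pending & -pending).bit_length() - 1  # index of the lowest set bit


print(highest_priority_irq(0b0001_0010, 0b0000_0000))  # 1: IRQ1 beats IRQ4
print(highest_priority_irq(0b0001_0010, 0b0000_0010))  # 4: IRQ1 is masked
```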

PIO 

Abbreviation of programmed I/O. With PIO, data is exchanged between the RAM and a peripheral not by means of DMA, but with IN and OUT instructions via the CPU.
Pipeline stage 

A unit or stage within a pipeline which executes a certain partial task. A pipeline for a memory 
access may include the four pipeline stages address calculation, address supply, reading the 
value and storing the value in a register. An instruction pipeline comprises, for example, the 
stages instruction fetch, instruction decode, execution and register write-back. 
Pipelining 

Starting the execution of a function of the next cycle before the function of the current cycle has 
been completed. For example, the 80286 provides the address for the next read cycle in advance 






1251 



of receiving the data of the current cycle. This is called address pipelining or pipelined addressing. Similarly, a processor can start the execution of parts of a complex instruction in an early pipeline stage before the preceding instruction has been completed in the last pipeline stage.

PIT

Abbreviation of programmable interval timer. A chip which outputs a pulse as soon as a programmed time period has elapsed. In the original PC designs you will find the 8253 or its successor, the 8254.
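The programmed time period is set by a 16-bit reload value that divides the timer's input clock. In the standard PC design that input clock is about 1.193182 MHz (a fact of the PC platform, not stated in this entry); the Python sketch below computes the reload value for a desired output rate, using the 8253/8254 convention that a written value of 0 stands for 65536. The function names are hypothetical.

```python
PIT_INPUT_HZ = 1_193_182  # input clock of the PC's 8253/8254 timer (about 1.19 MHz)


def pit_divisor(target_hz: float) -> int:
    """16-bit reload value that makes a timer channel fire at roughly target_hz.
    The largest divisor, 65536, is programmed as 0."""
    divisor = round(PIT_INPUT_HZ / target_hz)
    if not 1 <= divisor <= 65536:
        raise ValueError("frequency out of range for a 16-bit counter")
    return divisor & 0xFFFF


def pit_frequency(reload_value: int) -> float:
    """Output frequency for a given reload value (0 stands for 65536)."""
    return PIT_INPUT_HZ / (reload_value or 65536)


print(pit_divisor(1000))           # 1193
print(round(pit_frequency(0), 1))  # 18.2, the familiar BIOS timer tick rate
```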
Pixel 

Short form of picture element; a point on a monitor. Usually the name pixel is only used in graphics mode. The pixel may be allocated one or more bits which define the colour and brightness of the picture element.
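The number of bits allocated to each pixel fixes both how many colours it can show and how much memory a full screen needs. The Python sketch below does that arithmetic; the 640 x 480 resolution at 4 bits per pixel is an illustrative choice, not a figure from this entry.

```python
def colours(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel with this many bits can hold."""
    return 2 ** bits_per_pixel


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Memory for one full screen of pixels, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


print(colours(1), colours(4), colours(8))  # 2 16 256
print(frame_bytes(640, 480, 4))            # 153600
```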
PLA 

Abbreviation of programmable logic array. A highly integrated chip with logic gates which is used as an ASIC, and whose logic can be freely programmed during manufacturing or by the user. A PLA usually has a field of AND gates and a field of OR gates. Applied to the inputs and their complements, AND and OR can be combined to achieve any logical combination. This is similar to the fact that every natural number can be represented with the binary digits 0 and 1.
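The AND field and OR field together form sum-of-products logic. The Python sketch below evaluates a hypothetical PLA whose AND field selects true or complemented inputs for each product term and whose OR field combines chosen terms into outputs. It is programmed here as XOR, (a AND NOT b) OR (NOT a AND b); the data layout is an assumption made for illustration.

```python
from itertools import product


def pla(and_plane, or_plane, inputs):
    """Evaluate a hypothetical PLA.

    and_plane: list of product terms; each term maps an input index to the
               required value (1 = true input, 0 = complemented input).
    or_plane:  list of outputs; each output lists the product terms to OR.
    inputs:    tuple of 0/1 input values."""
    terms = [all(inputs[i] == want for i, want in term.items()) for term in and_plane]
    return tuple(int(any(terms[t] for t in output)) for output in or_plane)


# Programmed as XOR: (a AND NOT b) OR (NOT a AND b).
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}]
OR_PLANE = [[0, 1]]

for a, b in product((0, 1), repeat=2):
    print(a, b, pla(AND_PLANE, OR_PLANE, (a, b))[0])
```

Reprogramming only the two planes, for example a single term {0: 1, 1: 1} for AND, changes the function without changing the evaluation code, which mirrors how the same PLA chip is configured for different logic.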
PLCC 

Abbreviation of plastic leaded chip carrier. A type of package where the contacts are located on all four sides.

Plug&Play 

A standard in which new (compliant) hardware that is connected to, or integrated in, a PC is 
automatically recognised, set up and configured when the PC is booted. 

PMOS 

Abbreviation of P-channel MOS. A technology used to manufacture MOS transistors where the channel conductivity is based on positively charged holes.

Polarization 

If the electric or magnetic field of an electromagnetic wave is oscillating in one direction only, the 
wave is linearly polarized. The direction of the magnetic field is called the polarization direction. 

Polarization filter 

A device used to separate the part with a specific polarization direction from an electromagnetic wave. Only the part whose polarization direction coincides with the polarization direction of the filter passes through the filter.

Port 

An address in the CPU's I/O address space. Usually, registers in peripherals are accessed via ports. 
Positioning time 

The time period between an instruction to position the read/write head and the head being moved to the indicated track.

POST 

Abbreviation of power-on self test. A program in ROM which detects and checks all installed components during power-on and reports its progress as POST codes. You can use a specially designed expansion card to display these codes, which is especially helpful when trying to detect the source of errors.



